DOC: Accessing files from a S3 bucket. #23639

Merged (2 commits), Nov 14, 2018

@myles (Contributor) commented Nov 12, 2018

Add some documentation about accessing files from a remote S3 bucket in pandas (pandas-dev#12206).
@myles changed the title from "Add Documentation for accessing files from S3." to "DOC: Accessing files from a S3 bucket." on Nov 12, 2018
@codecov bot commented Nov 12, 2018

Codecov Report

Merging #23639 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #23639      +/-   ##
==========================================
- Coverage   92.24%   92.24%   -0.01%     
==========================================
  Files         161      161              
  Lines       51326    51314      -12     
==========================================
- Hits        47347    47335      -12     
  Misses       3979     3979
Flag Coverage Δ
#multiple 90.63% <ø> (-0.01%) ⬇️
#single 42.31% <ø> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/io/common.py 70.54% <0%> (-0.23%) ⬇️
pandas/io/parquet.py 84.61% <0%> (-0.15%) ⬇️
pandas/io/html.py 91.2% <0%> (-0.06%) ⬇️
pandas/core/series.py 93.68% <0%> (-0.03%) ⬇️
pandas/io/json/json.py 93.09% <0%> (-0.02%) ⬇️
pandas/core/frame.py 97.02% <0%> (-0.01%) ⬇️
pandas/io/parsers.py 95.55% <0%> (-0.01%) ⬇️
pandas/tseries/offsets.py 97.07% <0%> (-0.01%) ⬇️
pandas/core/groupby/ops.py 96.79% <0%> (ø) ⬆️
pandas/plotting/_core.py 83.63% <0%> (ø) ⬆️
... and 4 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b9ba708...21487b1.

Load a file from S3
-------------------

Pandas support loading files from a S3 bucket for remote file interactivity.
A member commented on this line:

NIT: But I think this should read 'Pandas supports loading files'
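
A minimal sketch of the loading that this doc line describes, assuming pandas is installed and, for the URL form only, the optional `s3fs` package plus network access and read permission on the bucket. The bucket and key in the commented-out line are placeholders, not real objects. Because `pd.read_csv` also accepts any object with a `read()` method, the runnable part uses an in-memory buffer as an offline stand-in for the downloaded S3 object.

```python
import io

import pandas as pd

# Remote form: with s3fs installed, pandas resolves "s3://" URLs itself.
# "my-bucket/path/to/tips.csv" is a placeholder, so the call stays commented out.
# df = pd.read_csv("s3://my-bucket/path/to/tips.csv")

# Offline stand-in: read_csv accepts any object with a read() method, so an
# in-memory bytes buffer can play the role of the downloaded S3 object.
buffer = io.BytesIO(b"total_bill,tip\n16.99,1.01\n10.34,1.66\n")
df = pd.read_csv(buffer)
print(df.shape)  # (2, 2)
```

For private buckets, s3fs resolves credentials through botocore, for example from environment variables or `~/.aws/credentials`; check the s3fs documentation for the exact lookup order.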

@alimcmaster1 (Member) commented:

This looks good to me. CC @jreback and @datapythonista

@datapythonista (Member) commented:

We already have this: https://pandas.pydata.org/pandas-docs/stable/io.html#reading-remote-files

I'd expand that section if anything is missing. Since there is a page specific to IO operations, I wouldn't expect many people to look in the cookbook for S3 options.

Does that make sense?

@datapythonista added the Docs and IO Data labels on Nov 12, 2018
@myles (Contributor, Author) commented Nov 13, 2018:

@datapythonista I've expanded the section in io and removed it from the cookbook.

@datapythonista (Member) left a review:

Thanks for updating, @myles. That looks good to me, but I don't know much about S3, so I'll let other people review.

@jreback added this to the 0.24.0 milestone on Nov 14, 2018
@jreback merged commit 6f8c6e1 into pandas-dev:master on Nov 14, 2018
@jreback (Contributor) commented Nov 14, 2018:

thanks!

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018
thoo added a commit to thoo/pandas that referenced this pull request Nov 15, 2018
* upstream/master: (25 commits)
  DOC: Delete trailing blank lines in docstrings. (pandas-dev#23651)
  DOC: Change release and whatsnew (pandas-dev#21599)
  DOC: Fix format of the See Also descriptions (pandas-dev#23654)
  DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. (pandas-dev#20374)
  ENH: Allow export of mixed columns to Stata strl (pandas-dev#23692)
  CLN: Remove unnecessary code (pandas-dev#23696)
  Pin flake8-rst version (pandas-dev#23699)
  Implement _most_ of the EA interface for DTA/TDA (pandas-dev#23643)
  CI: raise clone depth limit on CI
  BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows (pandas-dev#23688)
  REF: Move Excel names parameter handling to CSV (pandas-dev#23690)
  DOC: Accessing files from a S3 bucket. (pandas-dev#23639)
  Fix errorbar visualization (pandas-dev#23674)
  DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678)
  DOC: Update is_sparse docstring (pandas-dev#19983)
  BUG: Fix read_excel w/parse_cols & empty dataset (pandas-dev#23661)
  Add to_flat_index method to MultiIndex (pandas-dev#22866)
  CLN: Move to_excel to generic.py (pandas-dev#23656)
  TST: IntervalTree.get_loc_interval should return platform int (pandas-dev#23660)
  CI: Allow to compile docs with ipython 7.11 pandas-dev#22990 (pandas-dev#23655)
  ...
thoo added a commit to thoo/pandas that referenced this pull request Nov 15, 2018
…fixed
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Labels: Docs, IO Data (IO issues that don't fit into a more specific label)
Projects: None yet
Development: successfully merging this pull request may close these issues:
DOC: improve s3 access doc-strings / docs
4 participants